National Repository of Grey Literature: 53 records found (records 1 - 10)
Unsupervised Evaluation of Speaker Recognition System
Odehnal, Ondřej ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
This thesis builds on a modern speaker recognition (SID) system based on x-vectors. The aim of the bachelor's thesis is to design and experimentally evaluate techniques for evaluating an SID system using unannotated audio recordings, i.e. without knowledge of the speaker. For this purpose, an embedding is extracted from each unannotated recording. The embeddings are then used to cluster the recordings and to create pseudo-labels from the clusters, on which the SID system is evaluated using the equal error rate (EER) metric. To create the pseudo-labels, the following unsupervised clustering algorithms were proposed: K-means, Gaussian mixture models (GMM) and agglomerative clustering. In testing, the best experimental procedure was K-means with the Silhouette metric, using cosine similarity as the distance measure. The best method achieved an EER of 5.72 % against a reference EER of 5.15 %, computed with knowledge of the labels, on part of the SITW dev-core-core dataset. Similar results were obtained on part of the SITW eval-core-core dataset, with an estimated EER of 5.86 % and a reference EER of 5.08 %. The difference between the estimated and reference values is 0.57 percentage points for dev-core-core and 0.78 percentage points for eval-core-core. Further tests on the NIST SRE16 and VoxCeleb1 datasets were carried out to verify the correctness of the proposed procedure. In general, the EER estimated by the proposed procedure deviated from the reference by approximately 1 percentage point, which is a fairly good result for an unsupervised algorithm.
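The evaluation loop described above (embeddings, clusters, pseudo-labels, EER) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the thesis's code: it assumes plain cosine scoring of the embeddings (x-vector systems often use a trained back-end such as PLDA instead) and takes the cluster labels as given, e.g. from K-means. The function names are invented for illustration.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two embeddings (assumed scoring back-end)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def trials_from_pseudo_labels(embeddings, labels):
    """Score every pair of recordings; same-cluster pairs become target trials."""
    target, nontarget = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            score = cosine_score(embeddings[i], embeddings[j])
            (target if labels[i] == labels[j] else nontarget).append(score)
    return target, nontarget

def equal_error_rate(target, nontarget):
    """Sweep thresholds over all observed scores (accept if score >= threshold)
    and return (EER, threshold) where false rejection and false acceptance meet."""
    best = None
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)          # missed targets
        far = sum(s >= t for s in nontarget) / len(nontarget)   # accepted impostors
        if best is None or abs(frr - far) < best[0]:
            best = (abs(frr - far), (frr + far) / 2, t)
    return best[1], best[2]
```

For example, `equal_error_rate(*trials_from_pseudo_labels(embeddings, cluster_ids))` gives the pseudo-label EER; running the same function with true speaker labels gives the reference EER the thesis compares against.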
Speech-signal-based recognition of type of transmission channel
Kopřiva, Tomáš ; Burget, Radim (referee) ; Atassi, Hicham (advisor)
This work deals with the classification of five different transmission channels by speech signal processing. The channels considered are GSM, two PSTN channels and two VoIP channels. For training and testing, a speech database of the transmission channels called SPLAB_TranCh was constructed. Its speech signals originally come from the well-known TIMIT database; each utterance was passed through each of the transmission channels. The main objective of this work is to find the features and classifier that yield the best classification accuracy. Several types of features, including MFCC, LPCC and spectral characteristics, were examined. The best suprasegmental features were identified using the mRMR (minimum redundancy, maximum relevance) algorithm. Several classifiers were tested as well. The results suggest that the transmission channel can be classified with high accuracy (around 90 %). The influence of adverse effects that can occur during transmission is also examined. The distortions considered are saturation, thresholding, echo, crackling noise, and noises and filters of various colours.
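The greedy mRMR selection mentioned above can be sketched as follows. This is an illustrative sketch, not the thesis's implementation: it uses the difference form of mRMR (relevance minus mean redundancy) with empirical mutual information, and assumes the continuous features (MFCC, LPCC, spectral statistics) have already been discretized, e.g. by binning. The feature names in any call are placeholders.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mrmr_select(features, labels, k):
    """Greedy mRMR (difference form). `features` maps a feature name to its
    discretized values; `labels` holds the channel class of each sample."""
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected, remaining = [], sorted(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Mean redundancy with the features picked so far
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        best = max(remaining, key=score)  # ties go to the alphabetically first name
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what distinguishes mRMR from ranking by relevance alone: a feature that duplicates an already selected one scores low even if it is individually informative.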
Voice Activity Detection
Břenek, Roman ; Grézl, František (referee) ; Matějka, Pavel (advisor)
This thesis describes techniques for voice activity detection in audio recordings. The task requires correctly classifying all non-speech segments and recognizing speech against a noisy background. The whole voice activity detection (VAD) process is described, i.e. digitizing the audio signal, feature extraction, system training, post-processing and final evaluation. Three different systems are compared. The first is based on phoneme recognition using a neural network; the other two are variants of Gaussian Mixture Models (GMM). Each system was tested on three data sets: the Tactical Speaker Identification Speech Corpus (TSID), Ham Radio (HR) and Rich Transcription Evaluation (RT05-RT07). The best results of each system are compared with results obtained by a third party.
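A GMM-based VAD of the kind compared above typically scores each frame under a speech model and a non-speech model and then smooths the decisions in post-processing. The sketch below illustrates that idea under simplifying assumptions that are not from the thesis: a single scalar feature per frame (e.g. log-energy), already trained 1-D mixtures passed in as `(weights, means, variances)` tuples, and majority-vote smoothing as the post-processing step.

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    logs = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(lg - top) for lg in logs))

def detect_speech(frames, speech_gmm, nonspeech_gmm, threshold=0.0, window=3):
    """Label each frame 1 (speech) or 0 (non-speech), then smooth by majority vote."""
    # Frame-level decision: log-likelihood ratio of the speech vs. non-speech model
    raw = [1 if gmm_loglik(x, *speech_gmm) - gmm_loglik(x, *nonspeech_gmm) > threshold else 0
           for x in frames]
    # Post-processing: a sliding-window majority vote removes isolated flips
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        seg = raw[max(0, i - half): i + half + 1]
        smoothed.append(1 if 2 * sum(seg) > len(seg) else 0)
    return smoothed
```

Raising `threshold` trades missed speech for fewer false alarms; `window` controls how short a speech or pause segment can be before it is smoothed away.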
Speech Recognition For Selected Languages
Schmitt, Jan ; Karafiát, Martin (referee) ; Janda, Miloš (advisor)
This bachelor's thesis deals with continuous speech recognition for three languages: Bulgarian, Croatian and Swedish. It describes the basics of speech processing and recognition methods, such as acoustic modeling using hidden Markov models and Gaussian mixture models. A further aim of this work is to prepare data for these languages from the GlobalPhone database so that they can be used with the Kaldi and HTK speech recognition toolkits. With the prepared data, several models were trained and tested using the Kaldi toolkit.
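Preparing a corpus for Kaldi largely means writing a data directory with the standard `wav.scp`, `text`, `utt2spk` and `spk2utt` files, each a key followed by its value and sorted by key. The sketch below shows that step under stated assumptions: the speaker IDs, utterance IDs, audio paths and transcripts are hypothetical placeholders, and any conversion of GlobalPhone's original audio format is not shown.

```python
import os
from collections import defaultdict

def write_kaldi_data_dir(out_dir, utterances):
    """Write wav.scp, text, utt2spk and spk2utt for a Kaldi data directory.
    `utterances` is an iterable of (speaker_id, utt_id, wav_path, transcript)."""
    os.makedirs(out_dir, exist_ok=True)
    rows = sorted(utterances, key=lambda u: u[1])  # Kaldi expects files sorted by key
    spk2utt = defaultdict(list)
    with open(os.path.join(out_dir, "wav.scp"), "w", encoding="utf-8") as wav, \
         open(os.path.join(out_dir, "text"), "w", encoding="utf-8") as text, \
         open(os.path.join(out_dir, "utt2spk"), "w", encoding="utf-8") as u2s:
        for spk, utt, path, transcript in rows:
            # Prefixing utterance IDs with the speaker ID keeps utt2spk and
            # spk2utt in a consistent sort order.
            if not utt.startswith(spk):
                raise ValueError(f"utterance {utt} should start with speaker {spk}")
            wav.write(f"{utt} {path}\n")
            text.write(f"{utt} {transcript}\n")
            u2s.write(f"{utt} {spk}\n")
            spk2utt[spk].append(utt)
    with open(os.path.join(out_dir, "spk2utt"), "w", encoding="utf-8") as s2u:
        for spk in sorted(spk2utt):
            s2u.write(f"{spk} {' '.join(spk2utt[spk])}\n")
```

Python's default string ordering compares code points, which matches the `LC_ALL=C` sort order Kaldi relies on for ASCII IDs.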
Emotional State Recognition Based on Speech Signal Analysis
Čermák, Jan ; Atassi, Hicham (referee) ; Smékal, Zdeněk (advisor)
The thesis focuses on the classification of emotional states in Matlab, using neural networks and a classifier based on a combination of Gaussian density functions. It deals with speech signal processing: prosodic and spectral features and MFCC coefficients were extracted from the signal. The work also evaluates the quality of the individual features, from which the most suitable were chosen to ensure correct classification of emotional states. Two different methods were used to identify the emotional states. The first was neural networks with differently selected parameters, and the second was a Gaussian mixture model (GMM). In both methods, a database of emotional utterances was divided into a training set and a test set. Testing was speaker-independent. The work also compares the analyzed methods and presents and compares the results. The conclusion proposes the best parameters and the best classifier for recognizing the speaker's emotional state.
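A GMM classifier of the kind described above keeps one mixture per emotional state and assigns an utterance to the state whose model gives the highest total log-likelihood over its feature frames. The sketch below shows only this scoring step; it assumes diagonal-covariance mixtures that have already been trained (e.g. with EM), and the emotion labels and parameters in any call are illustrative, not the thesis's.

```python
import math

def diag_gmm_loglik(x, gmm):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM
    given as a list of (weight, mean_vector, variance_vector) components."""
    logs = []
    for weight, mean, var in gmm:
        lg = math.log(weight)
        for xi, mi, vi in zip(x, mean, var):
            lg -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(lg)
    top = max(logs)  # log-sum-exp over components
    return top + math.log(sum(math.exp(lg - top) for lg in logs))

def classify_utterance(frames, class_gmms):
    """Sum frame log-likelihoods per class; return (best_label, all_scores)."""
    scores = {label: sum(diag_gmm_loglik(f, gmm) for f in frames)
              for label, gmm in class_gmms.items()}
    return max(scores, key=scores.get), scores
```

Summing frame log-likelihoods treats frames as independent, which is the usual simplification for GMM-based utterance classification.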
Smoke and Fire Detection in Video Sequences
Tomek, Peter ; Štancl, Vít (referee) ; Švub, Miroslav (advisor)
This thesis aims to analyse an input video signal and find segments that contain fire or smoke. The problem is divided into two cases: detection of fire and detection of smoke. The first and main step of the analysis is detecting segments with a Gaussian mixture model trained by the Expectation-Maximization (EM) algorithm. Optical flow is then used for smoke detection. The final segments are processed with morphological operations, and their positions are determined. Finally, the output of the algorithm is again a video signal in which segments that probably contain fire or smoke are highlighted.
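Training a Gaussian mixture model with EM, as described above, alternates between computing each component's responsibility for each sample (E-step) and re-estimating weights, means and variances from those responsibilities (M-step). The following is a minimal one-dimensional sketch for illustration; the thesis works on video pixel data, which would normally mean multi-dimensional (e.g. colour) features, and the initial means here are an assumption supplied by the caller.

```python
import math

def em_gmm_1d(data, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means.
    Returns (weights, means, variances)."""
    k, n = len(init_means), len(data)
    means = list(init_means)
    weights = [1.0 / k] * k
    mu = sum(data) / n
    variances = [max(sum((x - mu) ** 2 for x in data) / n, min_var)] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in data:
            logs = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            p = [math.exp(lg - top) for lg in logs]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the weighted samples
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty components
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                               min_var)
    return weights, means, variances
```

The `min_var` floor keeps a component from collapsing onto a single sample, a common failure mode of unconstrained EM.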
Retinal blood vessel segmentation in fundus images via statistical-based methods
Šolc, Radek ; Walek, Petr (referee) ; Odstrčilík, Jan (advisor)
This diploma thesis deals with the segmentation of blood vessels in images acquired by a fundus camera. The characteristics of fundus images and current segmentation methods are described in the theoretical part. The practical part presents a method based on a statistical model: a model using Student's t-distribution for automatic segmentation is developed step by step. First, the EM algorithm was incorporated; the model was then built on Markov random fields to improve robustness to noise. The contrast of thin blood vessels is improved in the preprocessing stage by the discrete wavelet transform. The output image is used as a mask to decrease the grayscale intensity of the thinnest blood vessels and to increase the intensity of the background. The whole model was programmed in Matlab and tested on the entire HRF database. The resulting binary images were evaluated quantitatively against gold-standard images.
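To illustrate how a discrete wavelet transform can raise the contrast of fine structures such as thin vessels, the sketch below applies a single-level 2-D Haar transform to each 2x2 block, scales the detail coefficients by a gain, and transforms back. This is a generic detail-boosting example chosen for illustration; the abstract does not specify the wavelet, the number of levels or how the coefficients were modified, so `haar_enhance` and its `gain` parameter are hypothetical.

```python
def haar_enhance(image, gain):
    """Single-level 2-D Haar DWT on 2x2 blocks, detail sub-bands scaled by
    `gain`, then inverse transform. Image height and width must be even."""
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            # Forward orthonormal Haar transform of the block
            approx = (a + b + c + d) / 2
            horiz = (a + b - c - d) / 2  # top row minus bottom row
            vert = (a - b + c - d) / 2   # left column minus right column
            diag = (a - b - c + d) / 2
            # Amplify the detail sub-bands; the approximation is left unchanged
            horiz, vert, diag = gain * horiz, gain * vert, gain * diag
            # Inverse transform back to pixel values
            out[i][j] = (approx + horiz + vert + diag) / 2
            out[i][j + 1] = (approx + horiz - vert - diag) / 2
            out[i + 1][j] = (approx - horiz + vert - diag) / 2
            out[i + 1][j + 1] = (approx - horiz - vert + diag) / 2
    return out
```

With `gain=1` the image is reconstructed exactly; with `gain > 1` each block's deviations from its local mean are amplified, while flat regions are left untouched.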
Neural networks in speaker classification
Svoboda, Libor ; Atassi, Hicham (referee) ; Míča, Ivan (advisor)
This work focuses on neural networks for speaker classification. It deals with the problems of speech signal processing and introduces several types of neural networks. As part of the work, a database of recordings of speakers of various sexes and ages was created, and a training set and a test set were drawn from it. Four classifiers were then proposed: one based on a Gaussian mixture model and three based on neural networks. The system was tested and analyzed with respect to age, gender and both criteria combined. At the same time, attention is paid to the choice of suitable features for each classification task. Finally, the results of the analysis are presented for the individual groups and features, and the most suitable features and the most successful classifier are identified for each classification task.
Voice Conversion
Brukner, Jan ; Plchot, Oldřich (referee) ; Černocký, Jan (advisor)
The thesis deals with voice conversion, a method that modifies the speech parameters of a source speaker to match those of a target speaker. The thesis first describes the Voice Conversion Challenge (VCC), in which participants tried to build better voice conversion systems. Next, the components of the baseline system used in the VCC are analysed, and modifications that could improve the quality of the converted voice are proposed. The implementation of these modifications is then briefly described and the results are analysed. The final part is dedicated to further improvements of voice conversion.
Acoustic signal classification
Pospíšil, Aleš ; Balík, Miroslav (referee) ; Atassi, Hicham (advisor)
This bachelor's thesis focuses on automatic music genre classification. The first part evaluates the current state of the field and refers to published studies; the knowledge gained there is applied in this work. To solve the problem, the work summarizes and describes suitable music features and classification techniques such as neural networks and k-nearest neighbours. The four selected classes were classical, electro, jazz and rock music. The result of the work is a user-friendly system for automatic music genre recognition. The achieved classification performance is roughly comparable to human music genre recognition.
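The k-nearest neighbours technique mentioned above assigns a track to the genre most common among the k training tracks closest to it in feature space. The sketch below is a minimal illustration, not the thesis's system: it assumes each track is already summarised by a numeric feature vector (the features and training data are hypothetical) and uses Euclidean distance with a simple majority vote.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a genre by majority vote among the k nearest training tracks.
    `train` is a list of (feature_vector, genre) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(genre for _, genre in nearest)
    # most_common orders equal counts by first occurrence, i.e. by proximity here
    return votes.most_common(1)[0][0]
```

Because distances mix all features, features on very different scales (e.g. tempo in BPM next to normalised spectral values) are usually standardised before k-NN is applied.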
